
fix: index embedded git repos hidden by the super-repo .gitignore #623

Open
kaito1337 wants to merge 1 commit into
colbymchenry:main from
kaito1337:fix/index-gitignored-embedded-repos

@kaito1337

Closes #622.

Problem

When a workspace is itself a git repo whose own .gitignore excludes the nested
clones it holds, codegraph indexes none of that nested source. The
embedded-repo recursion from #193 only fires for repos that surface as
untracked entries (git ls-files -o --exclude-standard), so a nested repo the
super-repo ignores stays invisible — as does a plain wrapper dir (e.g.
services/) that merely holds sibling clones.

Real-world: a 17-service Go monorepo (go.work, each services/* / shared/
its own clone, all excluded by the root .gitignore to keep git status clean)
indexed 3 files / 0 symbols.

Fix

src/extraction/index.ts:

  • New pass over ignored directories — git ls-files -o -i --exclude-standard --directory — recursing into any entry that is, or contains, a git repo
    (findEmbeddedReposUnder + collectIgnoredEmbeddedRepos). Wrapper dirs are
    descended a bounded depth; DEFAULT_IGNORE_DIRS and dot-dirs are skipped.
  • Files from these repos are filtered only by the built-in defaults
    (node_modules, …), not the super-repo's .gitignore — that exclusion is
    the parent's tracking hygiene. Each embedded repo still honors its own
    .gitignore inside collectGitFiles.
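
The two bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SKIP_DIRS`, `MAX_WRAPPER_DEPTH`, `findReposUnder`, `collectIgnoredRepos`, and `passesDefaultIgnores` are hypothetical stand-ins for `DEFAULT_IGNORE_DIRS`, the bounded depth, and the real helpers, and the sketch assumes the caller has already captured the output of `git ls-files -o -i --exclude-standard --directory`, where ignored directories appear with a trailing slash.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical stand-ins; the real ignore list and depth limit are defined in the PR.
const SKIP_DIRS = new Set(["node_modules", "dist", "build"]);
const MAX_WRAPPER_DEPTH = 3;

/** True if `dir` holds a `.git` entry (a directory for clones, a file for worktrees). */
function isGitRepo(dir: string): boolean {
  return fs.existsSync(path.join(dir, ".git"));
}

/** Names of the immediate subdirectories of `dir`; empty if it cannot be read. */
function listSubdirs(dir: string): string[] {
  try {
    return fs
      .readdirSync(dir, { withFileTypes: true })
      .filter((entry) => entry.isDirectory())
      .map((entry) => entry.name);
  } catch {
    return [];
  }
}

/** Return `dir` if it is a repo, else any repos found within `depth` levels below it. */
function findReposUnder(dir: string, depth: number = MAX_WRAPPER_DEPTH): string[] {
  if (isGitRepo(dir)) return [dir];
  if (depth <= 0) return [];
  const found: string[] = [];
  for (const name of listSubdirs(dir)) {
    // Dot-dirs and default-ignored dirs are never descended.
    if (name.startsWith(".") || SKIP_DIRS.has(name)) continue;
    found.push(...findReposUnder(path.join(dir, name), depth - 1));
  }
  return found;
}

/** Map ls-files output to the embedded repos it hides, as paths relative to `root`. */
function collectIgnoredRepos(root: string, lsFilesOutput: string): string[] {
  const repos: string[] = [];
  for (const line of lsFilesOutput.split("\n")) {
    const entry = line.trim();
    if (!entry.endsWith("/")) continue; // only directory entries can hold a repo
    const rel = entry.slice(0, -1);
    const name = path.basename(rel);
    if (name.startsWith(".") || SKIP_DIRS.has(name)) continue;
    repos.push(...findReposUnder(path.join(root, rel)));
  }
  return repos.map((abs) => path.relative(root, abs)).sort();
}

/** Second bullet: filter by built-in defaults only, not the super-repo's .gitignore. */
function passesDefaultIgnores(relFile: string): boolean {
  return !relFile.split("/").some((segment) => SKIP_DIRS.has(segment));
}
```

In this sketch a wrapper dir such as `services/` is not a repo itself, so the search descends into it and returns each sibling clone, while an ignored `node_modules/` is dropped before any descent.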

Consistent with the intent of #193 (embedded repos) and #147 (submodules):
nested project boundaries get indexed; the only gap was the ignored ones.

Tests

3 regression tests added to the existing Nested non-submodule git repos block:
ignored embedded repo, ignored wrapper dir with sibling repos, and a hidden
repo's own .gitignore still being honored.

npx vitest run __tests__/extraction.test.ts -t "Nested non-submodule"   # 5 passed

tsc build clean; the gitignore/submodule/embedded suites still pass.

Verified on the real monorepo

The patched scanner takes the layout above from 3 to 658 source files
(516 under services/, 139 under shared/; 616 are .go) without any change
to its .gitignore.

Known follow-up (out of scope)

Incremental sync via git status --porcelain on the super-repo won't see edits
inside an ignored embedded repo; the filesystem watcher backstops live updates
and a full re-index always picks them up. Flagged in #622 for a separate look.

collectGitFiles only recurses into embedded (non-submodule) repos that
surface as untracked entries via `git ls-files -o --exclude-standard`.
When the super-repo's own .gitignore excludes a nested clone — common in
workspaces that hold independent repos and ignore them to keep their own
`git status` clean — the repo never appears there, so none of its source
is indexed. Wrapper dirs (a plain dir holding sibling clones) are missed
for the same reason.

Add a pass over ignored directories (`git ls-files -o -i
--exclude-standard --directory`) and recurse into any that is, or
contains, a git repo. Files from these repos are filtered only by the
built-in defaults (node_modules, ...), not the super-repo's .gitignore:
that exclusion is the parent's tracking hygiene, and each embedded repo
still honors its OWN .gitignore inside collectGitFiles.

Complements the untracked-embedded-repo handling from colbymchenry#193.

Closes colbymchenry#622.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Embedded git repos excluded by the super-repo's .gitignore are never indexed
